Towards a new Approach for Arabic root extraction: Exploit relations between the word letters and their placement in the word for Arabic root extraction
Abstract
This paper presents a new root-extraction approach for Arabic words. The approach tries to assign a unique root to each Arabic word without relying on a database of word roots, a list of word patterns, or a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict the positions of the root letters one by one, based on rules and relations among the word's letters and their placement in the word. This paper focuses on two parts of the approach. The first introduces rules to distinguish between the Arabic definite article (ال, āl) and the permanent component (ال, āl) that may be found in any Arabic word. The second classifies Arabic letters into groups according to their positions in the word. The proposed approach is a system composed of several modules used to extract the word root. The approach has been evaluated on the words of the Holy Quran, and the evaluation results show that the root-extraction algorithm is promising.
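As a loose illustration only (the abstract does not state the paper's actual rules or letter groups), the general idea of reasoning about individual letters instead of stripping affixes from a list can be sketched with a well-known fact of Arabic morphology: only the ten classical augment letters, remembered by the mnemonic سألتمونيها, can be added to a root, so any other letter in a word must belong to the root. The function names below are hypothetical and are not taken from the paper.

```python
# Assumption of this sketch: the ten classical augment letters
# (mnemonic "سألتمونيها"). This is NOT the paper's letter grouping.
AUGMENT_LETTERS = frozenset("سألتمونيها")


def classify_letters(word):
    """Label each letter 'root' if it can never be an affix, else 'unknown'."""
    return [
        (ch, "unknown" if ch in AUGMENT_LETTERS else "root")
        for ch in word
    ]


def certain_root_letters(word):
    """Return only the letters that must belong to the root."""
    return [ch for ch, label in classify_letters(word) if label == "root"]


# Example word: "الكتاب" (al-kitāb, "the book"), whose root is ك ت ب.
word = "الكتاب"
print(certain_root_letters(word))
```

For الكتاب this keeps ك and ب, two of the three root letters. The third, ت, stays "unknown" because it can also act as an augment letter, which is the kind of ambiguity that rules based on letter position are meant to resolve.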
Similar resources
Rule-based Approach for Arabic Root Extraction: New Rules to Directly Extract Roots of Arabic Words
Extracting word roots in the Arabic language is very problematic due to the specific morphological and structural changes in the language. To address this problem, several techniques have been proposed. This paper continues the work on identifying and exploiting relationships among Arabic letters for Arabic root extraction begun in [1]. Eight different rules that detect the root letters accordi...
Extracting the roots of Arabic words without removing affixes
Most research in Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to dealing with this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, which is called the Word Substring ...
Representation of Arabic Words - An Approach Towards Probabilistic Root-Pattern Relationships
In traditional Arabic NLP, a root-pattern relationship has generally been considered a simple relationship, whereas the possibility of treating it as a statistical measure has been largely neglected and never formally considered. This paper therefore attempts to explore some issues involved in considering the classical phenomenon of Arabic root-pattern relationships as pr...
A Markovian approach for Arabic root extraction
In this paper, we present an Arabic morphological analysis system that assigns, for each word of an unvoweled Arabic sentence, a unique root depending on the context. The proposed system is composed of two modules. The first one consists of an analysis out of context. In this module, we segment each word of the sentence into its elementary morphological units in order to identify its possible r...
An Approach for Arabic Root Generating and Lexicon Development
This paper presents a novel approach for Arabic root generation and lexicon development. The approach provides three algorithms. In the first, Arabic word roots are generated using permutation and combination: the root generator algorithm generates roots by applying permutations to the letters of the Arabic alphabet. Then, the second algorithm is used for developing difference ...
Journal title:
- Computer Science (AGH)
Volume 14, Issue
Pages -
Publication date 2013